SEQ ID NO:1

TCNNATTCCCGAGAATAAATTTCTGTGACT^ 
TAACATACCCAAGCTGCCTCTGCCTCCCGCA^ 
CCCCACCTGCCCCACACATCCTCCCCTATGra^ 

GTGGGTATATTGGNCTCATTGAGACTGCAGGCCCTTGGAGGGCAGGCTCTGACCTGCAG^ 

TACTCAGCACACANTAGGTGGATAAATACCCCCACAGTAGGTGGGTAGTGA 

CCATCTACATGGGCANAGCCTGCTTTAAGCGTGGGT^ 

ATCTAC^CAAGTCCATCCTCAGCTCTTCCACTCCCGGGTTCCCTCCTGGACCTGTGTGACTCT 
GAATTCCTAACCTCCCCTTTCAACTGAGCCCTT^ 

AACAAGGGGACTGTGTCTGTGGCTGGATGACTCATGCACACTGCTCCATCCCGCAATCTTGGGCGGGACTTGGGC 
TGGGGAGGATGCCAGCCAGCTCAGGCTAGGAGCTTGCATCCTGTTGCCCCAACCCAGCCCTACCAGAACAGAGTG 
TACTCAGAGCTCCTVGGACAAAAATCTGGAAACAGAGAGCCGGC^ 

GAGCAGGCAGAGGAAACAGCAAGTTCAAAGTTCCTGAGGTGGGAATGCGCTTGACACAACGGAGACCTGAGAAGA 
ACACAGCAAAGGCCGTGTTACATTTGTCTGNGACTCC^GCCCC 

CTCACCTGGATAATCCAGAGCCATGGCCCATNACANGNNTNCTTCTTTTTTTTTTTTC 
TCTTTTTTTGNNNl^GGCCCCAAGACAGGCTTTCTT^ 

AAACTGGCCTGNGAACTCACAGAGATCCTCCTGNCTTTGNCTNCCGAGTACAAGGGTTAAAAGCCTGAGC 

CCACTGGCCAGGCTAACTAAGGTTCTTAACTTTTTAAGNATTATTTTTCTTTCTTATC 

GGGGATGCACAAGGGCATGGGGGGGGGGTCCCTGCAGAAGTCAGAAGAGGTGCCAGATCCCTGGGAGCTC^ 

AAAGTCAGTC^TGAAACATCCAAGATGGACACTGGGNAACTGAACTTGGGTCCTCTGCGAGAGGAGTAATGGTCT 

TAACTGCTGAGCCATCTCTAGGCCCAATGTCTGGT^ 

TTTGTATTTGGGGGTTTTTGTTTGTCTGTTTGGTTC 

GCCCTGGCTGTCCTGGAACTCACTCTATAGACTAGGCTGGCCTCGAACTCAGAAATCCTCCTGCCTCTGCCTCCC 

AAGTGCTGAGATTAAAGGCCCGTGCCACCACTGCCCGACGCCAATGTCTGTATTTTATTCATCTCTGCAGAATCT 

CTTTTGTCTCCTAACGGAACATCATCCCAGATTCTGGGAAGTACACTGAAGACAATGGGGTGGGTGTTGTTTCTC 

TCCTATGCCCTTTACATNCTCCCTACCT^^ 

AAACCTCCCTC 
Alignment of the two sequences RapR71 and RapR72.



71 
72 



consensus 1 



AGNAAGGAGAGGTAGGGTCAACACTGATTTCTGGCTTCSAGaATTCCBoAGAATAAATTT 



* * ***** ************ 



TQSMBATTCCBGAGAATAAATTTi 



71 
72 

consensus 61 ************************************ * ******************** 



ctgtgactaactcttccttttgttggttcttcatggmai 
ctgtgactaa^ctcttccttttgttggttcttcatggSa^ 



71 
72 



121 
83 



cccaagctgcctctgcctcccgcagtgaacccctaccctgccctttggcaggttctctto 
cccaagctgcctctgcctcccgcagtgaacccctaccctgccctttggcaggttctctt^ 



consensus 121 ************************************************************ 



71 
72 



181 
143 



ctgaccatccccacctgccccacacatcctcccctatgcaccccaactStgagcccctcc 
ctgaccatccccacctgccccacacatcctcccctatgcaccccaact»Stgagcccctcc 



consensus 181 ************************************************ ********* 



71 
72 



241 
203 



tgctcagtaagtctgtagacttggtgggtatattgggctcattgagactgcaggcccttg 
tgctcagtaagtctgtagacttggtgggtatattggSctcattgagactgcaggcccttg 



consensus 241 ************************************ ********************** 



71 
72 



301 
263 



gagggcaggctctgacctgcagtaagatgtgtgagtgatactcagcacacaStaggtgga 
gagggcaggctctgacctgcagtaagatgtgtgagtgatactcagcacacaStaggtgga 



consensus 301 *************************************************** ******** 



71 
72 



361 
323 



taaatacccccacagtaggtgggtagtgagccctgtgagtccactgtaagfflaccatcta 
taaatacccccacagtaggtgggtagtgagccctgtgagtccactgtaagBcaccatcta 



consensus 361 ************************************************** ******** 



71 
72 



420 
383 



catgggca^gcctgctOTaagcgtgggtta^ggacacaacagtttSttcagagggcttc 
catgggcaBagcctgct«taagcgtgggtta5ggacacaacagtttSttcagagggcttc 



consensus 421 ******** ******** ,************ ************** **** 



********* 



tggcaccat^tacacaagccatnctcagStKttccact«ccgggttPccctSctggacc 
tggcaccatmtacacaagccatnctcagStSttccactSccgggtttccctBctggacc 



consensus 481 ********* .****************** * ******* ******* **** ******* 



71 
72 



479 
443 



TGTGTGACTCTGAGGgAflCTTGGGGAATTigCTgAgCCTOCCCTTTCAACTGgACCCTTGG 
TGTGTCACTC^ 



consensus 541 *************** *.*********** ** * *** ************ ******* 



71 
72 



538 
502 



71 
72 

consensus 601 ** **** * .* *****.****.**** *★** *** ********** ****** 



GNgCCCggGAACAANGGGHACTGTG 

gmScccHScaacaanggggactgtg 
SEQ ID NO:2

ATGAGTGAGTCTATACTCACAGGCACTGAGAAAGCCAGACTCAACGGCTA 

CCTCCTCCAAGATGTAACCATGATCTACCAGCTCATCACAGGCCACAGCT 

TAAACCTCCCTCCCCTTTGTCACATCTCCACCATCAACCACACCCTTCCA 

TCTTTCTCTTCATCTGACACATATCTTCCAACCCTTCAGTCATCTAATAA

GCAGACTTTAAAAGCCACGGGTCCTGGATATCCAATGGAAAATGACCAAA 

GGAAGAACACTTGCTCCTTAGTCCGACAAGAAGGTTTCAAAGGAGTC^ 

TTGCATGCTGAAGCACTTCCCACAGAAGGAGCACCCCCCCCCCCACCTCA 

TCTGCAGGATTCCGAGATGGAAGAGAAGAGGCGAAAATATTCCATCAGCA 

GCGACAACTCTGATACCACTGACGGTCACGTGACATCCACATCAGCATCA 

AGATGTTCCAAACTGCCCAGCAGCACCAAGTCGGGCTGGCCCCGGCAGAA 

CGAGAAGAAGCCCTCAGAGGTTTTCCGGACAGACTTGATCACAGCCATGA 

AGATCCCAGATTCATACCAGCTCAGCCCGGATGACTACTACATCCTGGCG 

GACCCGTGGCGACAAGAATGGGAGAAAGGGGTGCAGGTACCTGCTGGAGC 

GGAGGCCATTCCAGAGCCTGTGGTGAGGCTCCTCCCACCACTGAAAGGCC 

CCCCCACGCAGATGTCCCCAGATAGCCCCACACTTGGTGAGGGTGCCCAT 

CCTGACTGGCCAGGAGGCAGCCGCTACGACCTGGATGAGATCGATGCGTA 

CTGGTTGGAACTTCTCAACTCGGAGCTCAAGGAGATGGAGAAGCCCGAGC 

TGGATGAGCTAACGTTAGAGCGTGTTCTAGAGGAGCTAGAGACATTGTGC 

CACCAGAATATGGCACAGGCCATTGAGACACAGGAGGGGCTGGGCATCGA 

GTACGACGAGGACGTTGTCTGCGACGTGTGCCGTTCCCCTGAAGGCGAGG 

ATGGCAACGAGATGGTCTTCTGTGACAAATGCAATGTCTGTGTGCACCAG 

GCATGCTACGGGATCCTCAAGGTGCCTACGGGCAGCTGGCTGTGCCGGAC 

CTGTGCCCTGGGAGTCCAGCCTAAGTGCCTGCTCTGCCCCAAGCGAGGAG 

GAGCCCTGAAGCCCACTAGAAGTGGGACCAAGTGGGTACACGTCAGCTGT 

GCCCTGTGGATTCCTGAGGTCAGCATTGGCTGTCCAGAGAAGATGGAGCC 

CATTACCAAGATCTCGCATATTCCGGCCAGCCGCTGGGCCCTGTCCTGCA 

GCCTCTGCAAGGAGTGCACAGGTACCTGCATCCAGTGTTCCATGCCTTCC 

TGCATCACAGCATTCCACGTTACGTGCGCCTTTGACCGAGGCCTGGAAAT 

GCGGACTATATTAGCTGACAATGACGAGGTCAAGTTCAAGTCACTTTGCC 

AGGAGCACAGTGACGGGGGCCCTCGGAGTGAGCCTACTTCTGAGCCTGTG 

GAGCCCAGCCAGGCCGTTGAGGATCTGGAAAAGGTGACCTTACGCAAGCA 

GCGGCTGCAGCAGCTGGAAGAAAACTTCTATGAGCTAGTGGAGCCAGCTG 

AGGTGGCTGAACGGCTAGACCTGGCTGAGGCACTGGTGGACTTCATCTAC 

CAGTACTGGAAGTTGAAGCGGAGAGCTAATGCCAACCAGCCGCTGTTGAC 

GCCCAAGACTGACGAGGTGGACAACCTGGCCCAACAGGAACAGGATGTCC 

TCTATCGACGCCTGAAGCTTTTCACCCACCTGCGGCAGGACCTGGAGAGG 

GTAAGGAACCTGTGCTACATGGTGACAAGACGGGAGAGAACGAAACACAC

CATCTGTAAACTTGAGGAGCAGATATTCCATCTACAGATGAAACTTATTG 

AGCAAGACCTTTGCAGAGAGCCTTCTGGGAGGAGGTCAAAGGGCAAGAAG 

AATGATTCAAAAAGGAAAGGCCGAGAGGGTCCCAAGGGCAGCAGCCCTGA 

GAAGAAAGAGAAAGTGAAGGCTGGGCCCGAGTCTGTGCTGGGGCAGCTGG 

GTCTATCCACCTCGTTCCCCATCGACGGCACTTTCTTCAACAGCTGGTTG 

GCACAGTCGGTTCAGATCACAGCAGAGGACATGGCCATGAGCGAGTGGTC 

TTTGAACAGTGGGCACCGGGAGGATCCTGCTCCAGGTCTGCTGTCAGAGG 

AATTGCTACAAGATGAGGAGACGCTGCTCAGCTTCATGAGGGACCCCTCG 

CTACGACCTGGTGACCCTGCCAGAAAGGCCCGAGGCCGCACTCGCCTGCC 

TGCCAAGAAGAAACCATCCCCGCTGCAGGATGGGCCCAGTGCACGGACCA 

CTCCAGACAAGCAACCCAAGAAGGCCTGGGCCCAGGATGGCAAGGGGACG

CAAGGACCACCCATGAGGAAGCCACCACGGAGGACGTCTTCTCATTTGCC

GTCCAGCCCTGCAGCTGGGGACTGTCCAGTCCCAGCAACACTGGAAAGCC 

CTCCACCACTGGCCTCCGAGATACTAGACAAGACAGCCCCCATGGCTTCC 

GACTTAAATGTCCAAGTGCCTGGCCCTACAGTGAGCCCCAAACCCTTGGG 

CAGGCTCCGGCCACCCCGAGAGATGAAGGTCAGTCGGAAATCTCCGGGTG 

CTAGATCCGATGCTGGGACAGGACTACCGTCTGCTGTGGCCGAGAGGCCA 

AAGGTCAGCCTGCATTTTGACACCGAGGCTGACGGCTACTTCTCTGATGA 

GGAGATGAGCGATTCTGAGGTAGAGGCAGAGGACAGTGGGGTACAACGAG 

CTTCCAGGGAGGCAGGGGCAGAGGAGGTGGTTCGCATGGGGGTGCTGGCC 

TCCTAA 
SEQ ID NO:3

MSESILTGTEKARLNGYLLQDVTMIYQLITGHSLNLPPLCHISTINHTLP 
SFSSSDTYLPTLQSSNKQTLKATGPGYPMENDQRKNTCSLVRQEGFKGVT
LHAEALPTEGAPPPPPHLQDSEMEEKRRKYSISSDNSDTTDGHVTSTSAS
RCSKLPSSTKSGWPRQNEKKPSEVFRTDLITAMKIPDSYQLSPDDYYILA 
DPWRQEWEKGVQVPAGAEAIPEPVVRLLPPLKGPPTQMSPDSPTLGEGAH
PDWPGGSRYDLDEIDAYWLELLNSELKEMEKPELDELTLERVLEELETLC 
HQNMAQAIETQEGLGIEYDEDVVCDVCRSPEGEDGNEMVFCDKCNVCVHQ
ACYGILKVPTGSWLCRTCALGVQPKCLLCPKRGGALKPTRSGTKWVHVSC 
ALWIPEVSIGCPEKMEPITKISHIPASRWALSCSLCKECTGTCIQCSMPS 
CITAFHVTCAFDRGLEMRTILADNDEVKFKSLCQEHSDGGPRSEPTSEPV 
EPSQAVEDLEKVTLRKQRLQQLEENFYELVEPAEVAERLDLAEALVDFIY 
QYWKLKRRANANQPLLTPKTDEVDNLAQQEQDVLYRRLKLFTHLRQDLER
VRNLCYMVTRRERTKHTICKLQEQIFHLQMKLIEQDLCREPSGRRSKGKK
NDSKRKGREGPKGSSPEKKEKVKAGPESVLGQLGLSTSFPIDGTFFNSWL
AQSVQITAEDMAMSEWSLNSGHREDPAPGLLSEELLQDEETLLSFMRDPS 
LRPGDPARKARGRTRLPAKKKPSPLQDGPSARTTPDKQPKKAWAQDGKGT 
QGPPMRKPPRRTSSHLPSSPAAGDCPVPATLESPPPLASEILDKTAPMAS 
DLNVQVPGPTVSPKPLGRLRPPREMKVSRKSPGARSDAGTGLPSAVAERP 
KVSLHFDTEADGYFSDEEMSDSEVEAEDSGVQRASREAGAEEVVRMGVLAS 

SEQ ID NO:4 

MEEKRRKYSISSDNSDTTDGHVTSTSASRCSKLPSSTKSGWPRQNEKKPS 

EVFRTDLITAMKIPDSYQLSPDDYYILADPWRQEWEKGVQVPAGAEAIPE

PVVRLLPPLKGPPTQMSPDSPTLGEGAHPDWPGGSRYDLDEIDAYWLELL

NSELKEMEKPELDELTLERVLEELETLCHQNMAQAIETQEGLGIEYDEDV 

VCDVCRSPEGEDGNEMVFCDKCNVCVHQACYGILKVPTGSWLCRTCALGV

QPKCLLCPKRGGALKPTRSGTKWVHVSCALWIPEVSIGCPEKMEPITKIS

HIPASRWALSCSLCKECTGTCIQCSMPSCITAFHVTCAFDRGLEMRTILA 

DNDEVKFKSLCQEHSDGGPRSEPTSEPVEPSQAVEDLEKVTLRKQRLQQL 

EENFYELVEPAEVAERLDLAEALVDFIYQYWKLKRRANANQPLLTPKTDE 

VDNLAQQEQDVLYRRLKLFTHLRQDLERVRNLCYMVTRRERTKHTICKLQ

EQIFHLQMKLIEQDLCREPSGRRSKGKKNDSKRKGREGPKGSSPEKKEKV 

KAGPESVLGQLGLSTSFPIDGTFFNSWLAQSVQITAEDMAMSEWSLNSGH 

REDPAPGLLSEELLQDEETLLSFMRDPSLRPGDPARKARGRTRLPAKKKP 

SPLQDGPSARTTPDKQPKKAWAQDGKGTQGPPMRKPPRRTSSHLPSSPAA

GDCPVPATLESPPPLASEILDKTAPMASDLNVQVPGPTVSPKPLGRLRPP

REMKVSRKSPGARSDAGTGLPSAVAERPKVSLHFDTEADGYFSDEEMSDS 

EVEAEDSGVQRASREAGAEEVVRMGVLAS
SEQ ID NO:5

gttttaaaaagaaacagaaacatacacagggggttggtgaatggtgccgaccgcggccatcgcagttggaggctattttttgggggggatgga
agagaagaggcgaaaatactccatcagcagtgacaactctgacaccactgacagtcatgcgacatctacatccgcatcaagatgctccaaact 
gcccagcagcaccaagtcgggctggccccgacagaacgaaaagaagccctccgaggttttccggacagacttgatcacagccatgaagatccc 
ggactcataccagctcagcccggatgactactacatcctggcagacccatggcgacaggaatgggagaaaggtgtgcaggtgcctgccggggc 
agaggccatcccagagcccgtggtgaggatcctcccaccactggaaggcccccctgcccaggcatccccgagcagcaccatgcttggtgaggg 
ctcccagcctgattggccagggggcagccgctatgacttggacgagattgatgcctactggctggagctcatcaactcggagcttaaggagat 
ggagaggccggagctggacgagctgacattagagcgtgtgctggaggagctggagaccctgtgccaccagaatatggccagggccattgagac 
gcaggaggggctgggcatcgagtacgacgaggatgttgtctgcgacgtgtgtcgctctcctgagggcgaggatggcaacgagatggtcttctg 
tgacaagtgcaacgtctgtgtgcatcaggcatgctacgggatcctcaaggtgcccacgggcagctggctgtgccggacgtgtgccctgggtgt 
ccagccaaagtgcctgctctgccccaagcgaggaggagccttgaagcccactagaagtgggaccaagtgggtgcatgtcagctgtgccctatg 
gattcctgaggtcagcatcggctgcccagagaagatggagcccatcaccaagatctcgcatatcccagccagccgctgggctctgtcctgcag 
cctctgcaaggaatgcacaggcacctgcatccagtgttccatgccttcctgcgtcacagcgttccatgtcacatgcgcctttgaccacggcct 
ggaaatgcggactatattagcagacaacgatgaggtcaagttcaagtcattctgccaggagcacagtgacgggggcccacgtaatgagcccac 
atctgagcccacggaacccagccaggctggcgaggacctggaaaaggtgaccctgcgcaagcagcggctgcagcagctagaggaggacttcta 
cgagctggtggagccggctgaggtggctgagcggctggacctggctgaggcactggtcgacttcatctaccagtactggaagctgaagaggaa 
agccaatgccaaccagccgctgctgacccccaagaccgacgaggtggacaacctggcccagcaggagcaggacgtcctctaccgccgcctgaa 
gctcttcacccatctgcggcaggacctagagagggttagaaatctgtgctacatggtgacaaggcgcgagagaacgaaacacgccatctgcaa 
actccaggagcagatattccacctgcagatgaaacttattgaacaggatctgtgtcgagagcggtctgggaggagagcaaagggcaagaagag 
tgactcgaagaggaagggctgcgagggctccaagggcagcactgagaagaaagagaaagtgaaggcggggcctgactcagtcctggggcagct 
ggcaggcctgtccacctcattccccatcgatggcaccttcttcaacagctggctggcacagtcggtgcagatcacagcagagaacatggccat 
gagcgagtggccactgaacaatgggcaccgcgaggaccctgctccagggctgctgtcagaggaactgctgcaggacgaggagacactgctcag 
cttcatgcgggacccctcgctgcgacctggtgaccctgctaggaaggcccgaggccgcacccgcctgcctgccaagaagaaaccaccaccacc 
accaccgcaggacgggcctggttcacggacgactccagacaaagcccccaagaagacctggggccaggatgcaggcagtggcaaggggggtca 
agggccacctaccaggaagccaccacgtcggacatcttctcacttgccgtccagccctgcagccggggactgtcccatcctagccacccctga



agctgtggctgagaggcccaaggtcagcctgcattttgacactgagactgatggctacttctctgatggggagatgagcgactcagatgtaga 
ggccgaggacggtggggtgcagcggggtccccgggaggcaggggcagaggaggtggtccgcatgggcgtactggcctcctaactcaccccctt 



HG.3A 



WO 2004/020581 PCT/US2003/026073 


SEQ ID NO:6

MVPTAAIAVGGYFLGGI^EKRRKYSISSDNSDTTDSHATST^ 

AMKIPDSYQLSPDDYYILADPWRQEWEKGVQVPAGAEAIPEPVVRILPPLEGPPAQASPSSTMLGEGSQPDWPGG 
SRYDLDEIDAYWLELINSELKEMERPELDELTLERVLEEI^ 
GEDGNEMWCDKCOTCVHQACYGILKOTTGSWL^ 
VSIGCPEKMEPITKISHIPASRWALSCSLCKECTGTCIQCSOT 

FCQEHSDGGPRNEPTSEPTEPSQAGEDLEKVTLRKQRLQQLEEDFYELVEPAEVAERLDLAEALVDFIYQYW

rkananqplltpktdevdnlaqqeqdvlyrrl 
lieqdlcrersgrrakgkksdskrkgcegskgstekkekvkag 
taenmamsewplnnghredpapgllseellqdeet^^ 
gsrttpdkapkktwgqdagsgkggqgpptrkpprrtsshlpsspaagdcpi^ 

SDVQVPGPAASPKPLGRLRPPRESKVTRRLPGARPDAGMGPPSAVAERPKVSLHFDTETDGYFSDGEMSDSDV^

edggvqrgpreagaeewrmgvlas 
SEQ ID NO:7 

meekrrkysissdnsdttdshatstsasrcsklpsstksgwprqi^kkpsevfrtdlitam 
iladpwrqevmkgvqvpagaeaipepvvrilpplegppaqaspsstmlgegsqpdwpggsrydldeiday^ 
nselkemerpeldeltlervleeletlchqnmaraietqeglgieyd^ 
vhqacygilkvptgswlcrtcalgvqpkcllcpk^ 

HIPASRWALSCSLCKECTGTCIQCSMPSCCTAFHVT 
EPTEPSQAGEDLEKVTLRKQRLQQLEEDFYELVEPAEVAERIjDI^^ 

VDNLAQQEQDVLYRRLKLFTHLRQDLERVRNLCYMVTRRERTKHAICKLQEQIFHLQMKLIEQDLCRERSGRRAK

GKKSDSKRKGCEGSKGSTEKKEKVKAGPDSVLGQI^GLSTSFP 

REDPAPGLLSEELLQDEETLLSFMRDPSLRPGDPARKARGRTI^ 

DAGSGKGGQGPPTRKPPRRTSSHLPSSPAAGDCPILATPESPPPLAPETPDEAASVAADSDVQVPGPAASPKPLG 
RLRPPRESKVTRRLPGARPDAGMGPPSAVAERPKVSLHFDTETDGYFSDGEMSDSDVEAEDGGVQRGPREAGAEE 
VVRMGVLAS
Exon 1 (SEQ ID NO: 8)

GGGGGTTGGTGAATGGTGCCGACCGCGGCCATCGCAGTTGGAGGCT 

GGAGTTACTTTGCGCCCACTCCTAGCGGCACCGGCTTAGGTCCTGCGGGCCGACCGTCCCCGGCGGGGGG
GGCCTGGGACGCCGCGGGCCCGGCCGCCTCCCTCGCCGCGACCCCGGATGGATGCGCGCCCCCCGCCCTCCCGCG 
CCGGCCCCAGGAGCTCCCGGCTTCGGGAGCATCCTTCCCGCGCCGGTCCCTGCAGCGGCGCGTAGCCGAGGGCAG 
CGCCCGTCAGGGGGGCACCGCGGAGCAAG 

Exon 2 (SEQ ID NO: 9)

ATGGAAGAGAAGAGGCGAAAATACTCCATCAGCAGTGACAACTCTGACACCACTGACA 
Exon 3 (SEQ ID NO: 10)

GTGATGCGACATCTACATCCGCATCAAGATGCTCCAAACTGCCCAGCAGCACCAAGTCGGGCTGGCCCCGACAGA
ACGAAAAGAAGCCCTCCGAG 

Exon 4 (SEQ ID NO: 11)

GTTTTCCGGACAGACTTGATCACAGCCATGAAGATCCCGGACTCATACCAGCTCAG 

CTGGCAGACCCATGGCGACAGGAATGGGAGAAAGGTGTGCAGGTGCCTGCCGGGGCAGAGGCCATCCCAGAGCCC 
GTGGTGAG 

Exon 5 (SEQ ID NO: 12) 

GATCCTCCCACCACTGGAAGGCCCCCCTGCCCAGGC^ 

TGATTGGCCAGGGGGCAGCCGCTATGACTTGGACGAGATTGATGCCTACTGGCTGGAGCTCATCAACTCGGAGCT 
TAAGGAGATGG 

Exon 6 (SEQ ID NO: 13)

AGAGGCCGGAGCTGGACGAGCTGACATTAGAGCGTGTGCTGGAGGAGCTGGAGACCCTGTGCCACCAGAATATGG 
CCAGGGCCATTGAGACGCAGGAGGGGCTGGGCATCGAGTACGACGAGGATGTTGTCTGCGACGTGTGTCGCTCTC 
CTGAGGGCGAGGATGGCAACGAGATGGTCTTCTGTGACAAGTGCAACGTCTGTGTGCATCAG

Exon 7 (SEQ ID NO: 14)

GCATGCTACGGGATCCTCAAGGTGCCCACGGGGA^ 

TGCCTGCTCTGCCCCAAGCGAGGAGGAGCCTTGAAGCCCACTAGAAGTGGGACCAAGTGGGTGCATGTCAGCTGT 
GCCCTATGGATTCCTGAG 

Exon 8 (SEQ ID NO: 15) 

GTCAGCATCGGCTGCCCAGAGAAGATGGAGCCCATCACCAAGATCTCGCATATCCCAGCCAGCCGCTGGGCTCTG 
TCCTGCAGCCTCTGCAAGGAATGCACAGGCACCTGCATCCAG 

Exon 9 (SEQ ID NO: 16) 
TGTTCCATGCCTTCCTGCGTCACA^ 

TTAGCAGACAACGATGAGGTCAAGTTCAAGTCATTCTGCCAGGAGCACAGTGACGGGGGCCCACGTAATGAGCCC

ACATCTGAGCCCACGGAACCCAGCCAGGCTGGCGAGGACCTGGAAAAGGTGACCCTGCGCAAGCAGCGGCTGCAG

CAGCTAGAGGAGGACTTCTACGAGCTGGTGGAGCCGGCTGAGGTGGCTGAGCGGCTGGACCTGGCTGAGGCACTG 

GTCGACTTCATCTACCAGTACTGGAAGCTGAAGAGGAAAGCCAATGCCAACCAGCCGCTGCTGACCCCCAAGACC 

GACGAGGTGGACAACCTGGCCCAGCAGGAGCAGGACGTCCTCTACCGCCGCCTGAAGCTCTTCACCCATCTGCGG 

CAGGACCTAGAGAGG 

Exon 10 (SEQ ID NO: 17) 

GTTAGAAATCTGTGCTACATGGTGACAAGGCGCGAGAGAACGAAACACGCCATCTGCAAACTCCAGGAGCAGATA 
TTCCACCTGCAGATGAAACTTATTGAACAGGATCTGTGTCGAG 
Exon 11 (SEQ ID NO: 18)

GCCTGTCCACCTGATTCCCCATCGATGGC^^ 

AGAACATGGCCATGAGCGAGTGGCCACTGAACAATGGGCACC^ 

AACTGCTGCAGGACGAGGAGACACTGCTCAGCTTCATGCGGGACCC^ 

AGGCCCGAGGCCGC^CCCGCCTGCCTCCCAAGAAGAAAC 

GGACGACTCCAGACAAAGCCCCCAAGAAGACCTGGGGCCAGGA 

CTACCAGGAAGCCACCACGTCGGACATCTTCTC^^ 

CC^CCCCTGAAAGCCCCCCGCCACTGGCCCCTGAGACCCCG^ 

TCCAAGTGCCTGGCCCTGCAGCAAGCCCTAAGCCTTTGGGCCGGCTCCGGCCACCCTCCGAGAGCAAGGT 
GGAGATTGCCGGGTGCCAGGCCTGATGCTGGGATGGGACCACCTTCAGCTGTGGCTGAGAGGCCCAAGG 
TGCATTTTGAGACTGAGACTGATGGCTACTTCTCTGATGGGGAGATGAGCGACTCAGATGTAGAGGCCGAGGACG 
GTGGGGTGCAGCGGGGTCCCCGGGAGGCAGGGGCAGAGGAGGTGGTCCGCATGGGCGTACTGGCCTCC 
SEQ ID NO: 19



Promoter and regulatory region: 

cggccctggggacagggcgggctaggggcgccccagagtccatggggagtccgggcccagggtgccagcaggcgt 

ggtggtggggctgcgagggagggcacccttcccccacggggcccgcaacgctacctggactccccgccggagcca 

aacaactgggcggggggttgggggggcggcgacgggggtgtcgggagcggagatccgagtgaataagaaaaaagt 

ggctactccccctccctcgctcctcctgcccgcccccaccccacccccaccccaacacattttttttttctaaag 

agatcacaaggaagtcttggtttaaaaagaaacagaaacatacacagggggttggtgaatggtgccgaccgcggc 

catcgcagttggaggctattttttggggggggtgagtagcgtccatggagttactttgcgcccactcctagcggc 

accggcttaggtcctgcgggccgaccgtccccggcggggggcgtggggcctgggacgccgcgggcccggccgcct 

ccctcgccgcgaccccggatggatgcgcgccccccgccctcccgcgccggccccaggagctcccggcttcgggag 

catccttcccgcgccggtccctgcagcggcgcgtagccgagggcagcgcccgtcaggggggcaccgcggagcaag 

gtaagatccagcccccggcggatgggccctgcgcatctccacgacgttatttggcgtttttgcaacagatctgcc 

agcgctcttcgctccctcgctctctcttgctcgctcgctccctctctctcctgctggctgcctgttctaggaagc

cagcgcggagaggggggggatgcacagcacaggggagagagattgcgcatgttggtcagtcgtgttttaaagagt

acagtgcggggaggctgagaggggcgcatgcaacaacaacttttggaagggtgagcttggcgaccttctttatta 

atgactgcggcaaagcgcccccgggccggcgagggggcgcgggcgggcgggggcgcgccagggctgcaacttccc 

cgcgggctccggccgggcgtaggggctgcggcgggagatgggtacggtggggaggtcgagcggcccgggcggggg 

ctccgagaacctggagctatctgccctcctgtctccccgagtttcattttgttgatacgcagcacgtccgggcgc 

cgaaccgggctgagccggtgcacatgacctcgcgctgggctcacgtgcagccggtccggtcccagacaccttccg 

ggggccaccgcctccgccctgtcgccccctctcccggcccggtgcacgcgggcgctgcacgcgggggcagcatgc

tcggctcctggggttggaggctctgcacaaattagacagtttttttggaggggcgggggacaccctttccaggtg 

agtgtggagggtgcg 
SEQ ID NO: 19



Promoter 2.0 Prediction Results 

cggccctggggacagggcgggctaggggcgccccagagtccatggggagtccgggcccag 
ggtgccagcaggcgtggtggtggggctgcgagggagggcacccttcccccacggggcccg 
caacgctacctggactccccgccggagccaaacaactgggcggggggttgggggggcggc 
gacgggggtgtcgggagcggagatccgagtgaataagaaaaaagtggctactccccctcc 
ctcgctcctcctgcccgcccccaccccacccccaccccaacacattttttttttctaaag 
agatcacaaggaagtcttggtttaaaaagaaacagaaacatacacagggggttggtgaat 
ggtgccgaccgcggccatcgcagttggaggctattttttggggggggtgagtagcgtcca 
tggagttactttgcgcccactcctagcggcaccggcttaggtcctgcgggccgaccgtcc 
ccggcggggggcgtggggcctgggacgccgcgggcccggccgcctccctcgccgcgaccc 
cggatggatgcgcgccccccgccctcccgcgccggccccaggagctcccggcttcgggag 
catccttcccgcgccggtccctgcagcggcgcgtagccgagggcagcgcccgtcaggggg 
gcaccgcggagcaaggtaagatccagcccccggcggatgggccctgcgcatctccacgac 
gttatttggcgtttttgcaacagatctgccagcgctcttcgctccctcgctctctcttgc 
tcgctcgctccctctctctcctgctggctgcctgttctaggaagccagcgcggagagggg 
ggggatgcacagcacaggggagagagattgcgcatgttggtcagtcgtgttttaaagagt 
acagtgcggggaggctgagaggggcgcatgcaacaacaacttttggaagggtgagcttgg 
cgaccttctttattaatgactgcggcaaagcgcccccgggccggcgagggggcgcgggcg 
ggcgggggcgcgccagggctgcaacttccccgcgggctccggccgggcgtaggggctgcg 
gcgggagatgggtacggtggggaggtcgagcggcccgggcgggggctccgagaacctgga 
gctatctgccctcctgtctccccgagtttcattttgttgatacgcagcacgtccgggcgc 
cgaaccgggctgagccggtgcacatgacctcgcgctgggctcacgtgcagccggtccggt 
cccagacaccttccgggggccaccgcctccgccctgtcgccccctctcccggcccggtgc 
acgcgggcgctgcacgcgggggcagcatgctcggctcctggggttggaggctctgcacaa 
attagacagtttttttggaggggcgggggacaccctttccaggtgagtgtggagggtgcg 



PREDICTED TRANSCRIPTION START SITES: 
Sequence, 1440 nucleotides 

Position Score Likelihood 

500 1.072 Highly likely prediction 
1100 0.587 Marginal prediction 

Promoter predictions for 1 eukaryotic sequence with score cutoff 0.80: 
Promoter predictions: 

Start  End   Score  Promoter Sequence
138    188   0.60   CCCGCCGGAGCCAAACAACTGGGCGGGGGGTTGGGGGGGCGGCGACGGGG (SEQ ID NO: 20)
481    531   0.88   CCGGCGGGGGGCGTGGGGCCTGGGACGCCGCGGGCCCGGCCGCCTCCCTC (SEQ ID NO: 21)
963    1013  0.98   ACCTTCTTTATTAATGACTGCGGCAAAGCGCCCCCGGGCCGGCGAGGGGG (SEQ ID NO: 22)
992    1042  0.84   GCCCCCGGGCCGGCGAGGGGGCGCGGGCGGGCGGGGGCGCGCCAGGGCTG (SEQ ID NO: 23)
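
The Start values in the promoter table can be checked against the SEQ ID NO: 19 listing. The following is a minimal sketch, assuming that Start is a 1-based position and that each reported promoter sequence is a 50-nucleotide window. The helper name `window` is hypothetical, and the excerpt is the 60-nucleotide line of the Promoter 2.0 listing that begins at position 481 (the listing is printed 60 nucleotides per line).

```python
def window(excerpt: str, excerpt_start: int, start: int, length: int = 50) -> str:
    """Return `length` nucleotides beginning at 1-based position `start`.

    `excerpt` is a piece of the full sequence whose first base sits at
    1-based position `excerpt_start` (an assumption of this sketch).
    """
    offset = start - excerpt_start
    if offset < 0 or offset + length > len(excerpt):
        raise ValueError("requested window falls outside the excerpt")
    return excerpt[offset:offset + length].upper()


# Ninth 60-nt line of the SEQ ID NO: 19 listing, i.e. positions 481-540.
LINE_481 = "ccggcggggggcgtggggcctgggacgccgcgggcccggccgcctccctcgccgcgaccc"

# Sequence reported for the prediction starting at 481 (SEQ ID NO: 21).
SEQ_ID_21 = "CCGGCGGGGGGCGTGGGGCCTGGGACGCCGCGGGCCCGGCCGCCTCCCTC"

print(window(LINE_481, 481, 481) == SEQ_ID_21)
```

Under these assumptions the window starting at 481 reproduces the sequence listed as SEQ ID NO: 21.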
Alternative Splicing-1

Overview 

This gene is defined by 155 cDNA clones and 162 sequences. It is located on chromosome
5 on the direct strand, from base 140093473 to base 140150963. According to RefSeq
annotation, its cytogenetic location is 5q31.2. The gene covers 57490 bp of genomic
DNA. It produces, by alternative splicing, 12 different transcripts (a, b, c, d, e, f,
g, h, i, j, k, l), altogether encoding 11 proteins.
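
The stated gene length follows directly from the two chromosome coordinates. A minimal sketch of that arithmetic, assuming the length is reported as the plain difference between the end and start coordinates (an inclusive base count would be one larger); the function name is hypothetical:

```python
def span_length(start: int, end: int) -> int:
    """Length of a genomic interval taken as end minus start."""
    if end < start:
        raise ValueError("end must not precede start on the direct strand")
    return end - start


GENE_START = 140_093_473  # chromosome 5, direct strand
GENE_END = 140_150_963

print(span_length(GENE_START, GENE_END))  # 57490
```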

http://www.ncbi.nlm.nih.gov/IEB/Reseajch/Acem 
&1=G t5 Hs5 7229 29 18 2019 - # st rue ture 

It contains 23 confirmed introns, 19 of which are alternative. Comparison to the
genome sequence shows that 18 introns follow the consensus [gt-ag] rule, 1 follows the
less frequent consensus [gc-ag], 2 are atypical with good support ([ct-gc], [ga-ct];
provided there is no error in the genome), and 2 are fuzzy or ill-defined.
The gene gives rise to 12 types of transcripts, predicted to encode 11 distinct
proteins.

http://www.ncbi.nlm.nih.gov/^
&a=fiche&l=G t5 Hs5 7229 29 18 2019 -# 
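
The intron classes described above are defined by the first two and last two bases of each intron. A minimal sketch of that classification, assuming introns are given as plain strings read 5' to 3'; the function names and the example intron sequences are hypothetical and are not taken from the gene:

```python
from collections import Counter


def classify_intron(intron: str) -> str:
    """Label an intron by its donor (first 2 nt) and acceptor (last 2 nt) bases."""
    seq = intron.lower()
    if len(seq) < 4:
        raise ValueError("intron too short to have both boundaries")
    boundary = f"{seq[:2]}-{seq[-2:]}"
    if boundary == "gt-ag":
        return "consensus"
    if boundary == "gc-ag":
        return "less frequent consensus"
    return "atypical"


def tally(introns: list[str]) -> Counter:
    """Count introns per boundary class."""
    return Counter(classify_intron(i) for i in introns)


# Made-up introns, one per class, for illustration only.
examples = ["gtaagtccttttcag", "gcaagtttttacag", "ctaaaaagc"]
print(tally(examples))

# The counts reported in the text add up to the 23 confirmed introns.
print(18 + 1 + 2 + 2 == 23)
```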
mRNA(s) and Protein(s)

Transcript   5' complete     Sequence
variant a    5'UTR 40bp      6028bp
variant b    5'UTR 1162bp    2997bp
variant c    5'UTR 339bp     2870bp
variant d    5'UTR 339bp     2618bp
variant e    5'UTR 339bp     1350bp
variant f    5'UTR 346bp     2985bp
variant g    5'UTR 339bp     6279bp
variant h    5'UTR 339bp     7876bp
variant i    5'UTR 339bp     627[illegible]bp
variant j    5'UTR 339bp     6529bp
variant k    no evidence     6478bp
Alternative Splicing-1

Intron exon structure and support 
PROTEIN ANALYSIS

http://www.ncbi.nto.^ 

he&l=G t5 Hs5 7229 29 18 2019»a - #c once D tuai translation, MW, pi 

The complete protein encoded between the first Met and the stop codon contains 850
residues. The calculated molecular weight of the protein is 93.6 kDa and its isoelectric
point is 5.4.
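
A calculated molecular weight of this kind is commonly obtained by summing average residue masses and adding one water for the free termini. The following is a minimal sketch of that calculation, not the method used by the source analysis, which is not stated. The function name is hypothetical, the mass table holds the standard average residue masses, and the example peptide is the short [KRRK] motif rather than the full 850-residue protein.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the N- and C-terminal H and OH


def molecular_weight(protein: str) -> float:
    """Average molecular weight in daltons of an unmodified peptide."""
    seq = protein.upper()
    unknown = set(seq) - AVERAGE_RESIDUE_MASS.keys()
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(molecular_weight("KRRK"), 2))
```

Dividing the result by 1000 gives the value in kDa, the unit used in the text.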

http://www.ncbi.nlm.nih.gov/IEB/Research/Acemblv/avxffl 
he&l=G tS Hs5 7229 29 18 2019.a-# 

Predicted cellular localization and motifs (Psort) 

PSORT II analysis (K. Nakai, http://psort.nibb.ac.jp), trained on yeast data and run on
May 26, 2002, predicts that the subcellular location of this protein is most likely
the nucleus (69%). Less likely possibilities are the cytoplasm (17%), the
mitochondria (4%), vesicles of the secretory system (4%) or the endoplasmic
reticulum (4%). The following domains were found:

from aa  to   domain                        [sequence]
20       23   Nuclear_localization_domain   [KRRK]
163      190  Coil_coil_4                   [ELINSELKEMERPELDELTLERVLEELE] (SEQ ID NO: 24)
514      522  2nd_peroximal_domain          [KLQEQIFHL]
548      564  Nuclear_localization_domain   [KRKGCEGSKGSTEKKEK] (SEQ ID NO: 25)
549      565  Nuclear_localization_domain   [RKGCEGSKGSTEKKEKV] (SEQ ID NO: 26)
650      656  Nuclear_localization_domain   [PARKARG]
661      667  Nuclear_localization_domain   [PAKKKPP]
663      666  Nuclear_localization_domain   [KKKP]
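
The domain table can be sanity-checked by comparing each coordinate pair with the length of the bracketed sequence. The following is a minimal sketch, assuming the "from aa" and "to" columns are 1-based and inclusive. The function names are hypothetical and the rows are transcribed from the table above with OCR damage repaired.

```python
# (from_aa, to_aa, sequence) rows transcribed from the PSORT II domain table.
DOMAINS = [
    (20, 23, "KRRK"),
    (163, 190, "ELINSELKEMERPELDELTLERVLEELE"),
    (514, 522, "KLQEQIFHL"),
    (548, 564, "KRKGCEGSKGSTEKKEK"),
    (549, 565, "RKGCEGSKGSTEKKEKV"),
    (650, 656, "PARKARG"),
    (661, 667, "PAKKKPP"),
    (663, 666, "KKKP"),
]


def is_consistent(start: int, end: int, seq: str) -> bool:
    """True when an inclusive 1-based range has the same length as its sequence."""
    return end - start + 1 == len(seq)


def overlap_agrees(a: tuple, b: tuple) -> bool:
    """True when two rows with overlapping ranges agree on the shared residues."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo > hi:
        return True  # no shared positions, nothing to compare
    return a[2][lo - a[0]:hi - a[0] + 1] == b[2][lo - b[0]:hi - b[0] + 1]


print(all(is_consistent(*row) for row in DOMAINS))
print(overlap_agrees(DOMAINS[3], DOMAINS[4]), overlap_agrees(DOMAINS[6], DOMAINS[7]))
```

The overlap check is useful here because two pairs of rows (548-564 with 549-565, and 661-667 with 663-666) describe overlapping stretches of the same protein.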

http://www.ncbi.nlm.nih.gov^^ 
he&l=G t5 Hs5 7229 29 18 2019.a-# 

Protein family classification (Pfam) 

Pfam analysis (http://pfam.wustl.edu), run on May 27, 2002, shows
a significant hit to the PHD-finger from 217 to 265, with score 62.6 and
E = 5.2e-16.

68 other expressed genes in the database also contain this motif.

The PHD finger [MEDLINE:95216093], [PUB00005675] is a C4HC3
zinc-finger-like motif found in nuclear proteins thought to be involved
in chromatin-mediated transcriptional regulation. The PHD finger
motif is reminiscent of, but distinct from, the C3HC4 type RING
finger. The function of this domain is not yet known, but in analogy
with the LIM domain it could be involved in protein-protein
interaction and be important for the assembly or activity of
multicomponent complexes involved in transcriptional activation or
repression. Like the RING finger and the LIM domain, the
PHD finger is thought to bind two zinc ions.

[1]: Trends Biochem Sci 1995;20:56-59.

[2]: J Mol Biol 2000;304:723-729.
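
The C4HC3 arrangement (four cysteines, one histidine, three cysteines) can be illustrated with a simple pattern search. This is a minimal sketch only: Pfam detects the domain with a profile hidden Markov model, not a regular expression, and the spacer ranges below are an illustrative assumption rather than an official pattern. The names `PHD_LIKE` and `find_c4hc3` are hypothetical. The test fragment is taken from the SEQ ID NO:4 listing with OCR spaces removed.

```python
import re

# Cys-Cys-Cys-Cys-His-Cys-Cys-Cys with assumed spacer lengths between ligands.
PHD_LIKE = re.compile(
    r"C.{2}C"      # Cys 1 and Cys 2
    r".{9,21}"
    r"C.{2,4}C"    # Cys 3 and Cys 4
    r".{4,5}"
    r"H.{2}C"      # His and Cys 5
    r".{12,46}"
    r"C.{2}C"      # Cys 6 and Cys 7
)


def find_c4hc3(protein: str):
    """Return (start, end) as 0-based half-open indices of the first match, or None."""
    match = PHD_LIKE.search(protein.upper())
    return match.span() if match else None


# Fragment of SEQ ID NO:4 covering the cysteine-rich stretch.
FRAGMENT = "CDVCRSPEGEDGNEMVFCDKCNVCVHQACYGILKVPTGSWLCRTC"

print(find_c4hc3(FRAGMENT))
```

Replacing the single histidine in the fragment removes the match, which is a quick way to see that the pattern depends on all eight ligand positions.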

There are also 2 non-significant Pfam hits.

http://www.ncbi.nlm.nih.gov/IEB/Research/Acemblv/av.cgi? 

he&l=G t5 Hs5 7229 29 18 2019.a-# " " ~ 

Protein homologies (BlastP) 

BlastP analysis, run at NCBI on the non-redundant database on May 27, 2002, shows 228
hits with expectancy less than 0.001. Interesting hits from this analysis are:

score  occurrences
1001   1            FLJ00195 protein
600    1            FLJ22479
404    2            bromodomain